13 research outputs found

    Selection Bias in News Coverage: Learning it, Fighting it

    News entities must select and filter the coverage they broadcast through their respective channels, since the set of world events is too large to be treated exhaustively. The subjective nature of this filtering induces biases due to, among other things, resource constraints, editorial guidelines, ideological affinities, or even the fragmented nature of the information at a journalist's disposal. The magnitude and direction of these biases are, however, largely unknown. The absence of ground truth, the sheer size of the event space, and the lack of an exhaustive set of absolute features to measure make it difficult to observe the bias directly, to characterize the nature of the leaning, and to factor it out to ensure neutral news coverage. In this work, we introduce a methodology to capture the latent structure of the media's decision process on a large scale. Our contribution is threefold. First, we show media coverage to be predictable using personalization techniques, and evaluate our approach on a large set of events collected from the GDELT database. We then show that a personalized and parametrized approach not only exhibits higher accuracy in coverage prediction, but also provides an interpretable representation of the selection bias. Last, we propose a method to select a set of sources by leveraging the latent representation. These selected sources provide a more diverse and egalitarian coverage, all while retaining the most actively covered events.

    A Dynamic Embedding Model of the Media Landscape

    Information about world events is disseminated through a wide variety of news channels, each with specific considerations in the choice of its reporting. Although the multiplicity of these outlets should ensure a variety of viewpoints, recent reports suggest that the rising concentration of media ownership may void this assumption. This observation motivates the study of the impact of ownership on the global media landscape and its influence on the coverage the actual viewer receives. To this end, the selection of reported events has been shown to be informative about the high-level structure of the news ecosystem. However, existing methods only provide a static view into an inherently dynamic system, yielding underperforming statistical models and hindering our understanding of the media landscape as a whole. In this work, we present a dynamic embedding method that learns to capture the decision process of individual news sources in their selection of reported events, while also enabling the systematic detection of large-scale transformations in the media landscape over prolonged periods of time. In an experiment covering over 580M real-world event mentions, we show our approach to outperform static embedding methods in predictive terms. We demonstrate the potential of the method for news monitoring applications and investigative journalism by shedding light on important changes in programming induced by mergers and acquisitions, policy changes, or network-wide content diffusion. These findings offer evidence of strong content convergence trends inside large broadcasting groups, influencing the news ecosystem in a time of increasing media ownership concentration.
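A minimal sketch of what "dynamic" adds over a static embedding: give each source one vector per time period and tie consecutive periods together with a random-walk smoothness penalty, so that large period-to-period drift of a source's vector flags a change in programming. All names, sizes, and the penalty weight below are hypothetical, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(1)
n_periods, n_sources, n_events, dim = 3, 4, 30, 2

# Hypothetical coverage tensor: C[t, s, e] = 1 if source s reported event e in period t.
C = (rng.random((n_periods, n_sources, n_events)) < 0.3).astype(float)

E = 0.1 * rng.standard_normal((n_events, dim))              # static event factors
S = 0.1 * rng.standard_normal((n_periods, n_sources, dim))  # one source vector per period

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fit_loss():
    P = sigmoid(np.einsum('tsd,ed->tse', S, E))
    return -np.mean(C * np.log(P) + (1 - C) * np.log(1 - P))

loss0 = fit_loss()
lr, lam = 0.5, 0.1          # lam couples consecutive periods (random-walk prior)
for _ in range(300):
    P = sigmoid(np.einsum('tsd,ed->tse', S, E))
    G = P - C
    grad_S = np.einsum('tse,ed->tsd', G, E) / n_events
    smooth = np.zeros_like(S)           # pull each S[t] toward its neighbours
    smooth[1:] += S[1:] - S[:-1]
    smooth[:-1] += S[:-1] - S[1:]
    E -= lr * np.einsum('tse,tsd->ed', G, S) / (n_periods * n_sources)
    S -= lr * (grad_S + lam * smooth)
loss = fit_loss()

# Period-to-period movement of each source's vector; spikes would indicate
# a programming change (e.g. after a merger or policy change).
drift = np.linalg.norm(S[1:] - S[:-1], axis=2)
```

The smoothness term is what makes detected drift meaningful: without it, per-period embeddings are free to rotate arbitrarily and distances between periods carry no signal.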

    Nanomechanical damping via electron-assisted relaxation of two-level systems

    We report on measurements of dissipation and frequency noise at millikelvin temperatures of nanomechanical devices covered with aluminum. A clear excess damping is observed after switching the metallic layer from the superconducting to the normal state with a magnetic field. Beyond the standard model of internal tunneling systems coupled to the phonon bath, here we consider the relaxation to the conduction electrons together with the nature of the mechanical dispersion laws for stressed/unstressed devices. With these key ingredients, a model describing the relaxation of two-level systems inside the structure due to interactions with electrons and phonons with well separated timescales captures the data. In addition, we measure an excess 1/f-type frequency noise in the normal state, which further emphasizes the impact of conduction electrons.

    Learning Representations of Source Code from Structure and Context

    Large codebases are routinely indexed by standard Information Retrieval systems, starting from the assumption that code written by humans shows statistical properties similar to those of written text [Hindle et al., 2012]. While those IR systems are still relatively successful inside companies, helping developers search their proprietary codebases, the same cannot be said about most public platforms: throughout the years many notable names (Google Code Search, Koders, Ohloh, etc.) have been shut down. The limited functionalities offered, combined with the low quality of the results, did not attract a critical mass of users to justify running those services. To this date, even GitHub (arguably the largest code repository in the world) offers search functionalities that are no more innovative than those present in platforms from the past decade. We argue that this failure can be imputed to the fundamental limitation of mining information exclusively from the textual representation of the code. Developing a more powerful representation of code will not only enable a new generation of search systems, but will also allow us to explore code by functional similarity, i.e., searching for blocks of code which accomplish similar (and not strictly equivalent) tasks. In this thesis, we want to explore the opportunities provided by a multimodal representation of code: (1) hierarchical (both in terms of object and package hierarchy), (2) syntactical (leveraging the Abstract Syntax Tree representation of code), (3) distributional (embedding by means of co-occurrences), and (4) textual (mining the code documentation). Our goal is to distill as much information as possible from the complex nature of code.
    Recent advances in deep learning provide a new set of techniques that we plan to employ for the different modes, for instance Poincaré Embeddings [Nickel and Kiela, 2017] for (1) hierarchical, and Gated Graph NNs [Li et al., 2016] for (2) syntactical. Last but not least, learning multimodal similarity [McFee and Lanckriet, 2011] is a further research challenge, especially at the scale of large codebases – we will explore the opportunities offered by a framework like GraphSAGE [Hamilton et al., 2017] to harmonize a large graph with rich feature information.
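The Poincaré embeddings cited for the hierarchical mode rest on a closed-form hyperbolic distance inside the unit ball. A minimal sketch of that distance (the two-dimensional points are purely illustrative):

```python
import numpy as np

def poincare_distance(u, v):
    """Hyperbolic distance between two points inside the unit ball,
    the metric underlying Poincaré embeddings of hierarchies."""
    diff = np.dot(u - v, u - v)
    denom = (1 - np.dot(u, u)) * (1 - np.dot(v, v))
    return np.arccosh(1 + 2 * diff / denom)

# Illustrative points: a root near the origin and nodes pushed toward the
# boundary, where distances grow much faster than Euclidean ones. This is
# what lets tree-like hierarchies embed with low distortion.
root = np.array([0.0, 0.0])
child = np.array([0.5, 0.0])
leaf = np.array([0.9, 0.0])
d_root_child = poincare_distance(root, child)   # arccosh(5/3) = ln 3
d_child_leaf = poincare_distance(child, leaf)   # far larger than the Euclidean 0.4
```

Because volume grows exponentially toward the boundary, a package hierarchy can place its root near the origin and its many leaves near the rim without crowding, which is the property that makes this metric attractive for mode (1).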

    Using holistic event information in the trigger

    In order to achieve the data rates proposed for the future Run 3 upgrade of the LHCb detector, new processing models must be developed to deal with the increased throughput. For this reason, we aim to investigate the feasibility of purely data-driven 'holistic' methods, with the constraint of introducing minimal computational overhead, hence using only raw detector information. These filters should be unbiased, having a neutral effect with respect to the studied physics channels. In particular, the use of machine learning based methods seems particularly suitable, potentially providing a natural formulation for heuristic-free, unbiased filters whose objective would be to optimize the trade-off between throughput and bandwidth.

    New approaches for track reconstruction in LHCb’s Vertex Locator

    Starting with Upgrade 1 in 2021, LHCb will move to a purely software-based trigger system. The new trigger strategy is therefore to process events at the full rate of 30 MHz. Given that the increase of CPU performance has slowed down in recent years, the predicted performance of the software trigger currently falls short of the necessary 30 MHz throughput. To cope with this shortfall, the Upgrade High Level Trigger currently uses impact parameter cuts on VELO tracks to reduce the number of tracks which require further processing. The presented work introduces an alternative approach to determining the impact parameter that achieves a more reliable uncertainty prediction than the current method, which could potentially improve the physics quality of these cuts at a later time.
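As context for the impact parameter cuts mentioned above: the impact parameter of a track is the 3D distance of closest approach between the extrapolated straight-line track and the primary vertex. A minimal geometric sketch (this is only the textbook definition, not the VELO parametrization or its uncertainty model):

```python
import numpy as np

def impact_parameter(origin, direction, vertex):
    """3D distance of closest approach between a straight-line track
    (a point on the track plus its direction) and a vertex position."""
    d = np.asarray(direction, dtype=float)
    d /= np.linalg.norm(d)
    r = np.asarray(vertex, dtype=float) - np.asarray(origin, dtype=float)
    # Remove the component of r along the track; what remains is perpendicular.
    return float(np.linalg.norm(r - np.dot(r, d) * d))

# A track parallel to the beam axis, displaced 1 mm from the primary vertex:
ip = impact_parameter(origin=[0.0, 1.0, 0.0],
                      direction=[0.0, 0.0, 1.0],
                      vertex=[0.0, 0.0, 0.0])   # ip == 1.0
```

Tracks from the primary interaction have an impact parameter compatible with zero, while decay products of long-lived particles do not, which is why a cut on this quantity (and a reliable estimate of its uncertainty) is an effective early filter.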
